A statistical method for revealing form-function relations in biological networks 
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Over the past decade, a number of researchers in systems biology have sought to relate the function 
of biological systems to their network-level descriptions — lists of the most important players and 
the pairwise interactions between them. Both for large networks (in which statistical analysis 
is often framed in terms of the abundance of repeated small subgraphs) and for small networks 
which can be analyzed in greater detail (or even synthesized in vivo and subjected to experiment), 
revealing the relationship between the topology of small subgraphs and their biological function has 
been a central goal. We here seek to pose this revelation as a statistical task, illustrated using a 
particular setup which has been constructed experimentally and for which parameterized models of 
transcriptional regulation have been studied extensively. The question "how does function follow 
form" is here mathematized by identifying which topological attributes correlate with the diverse 
possible information-processing tasks which a transcriptional regulatory network can realize. The 
resulting method reveals one form-function relationship which had earlier been predicted based on 
analytic results, and reveals a second for which we can provide an analytic interpretation. Resulting 
source code is distributed via http://formfunction.sourceforge.net. 



The observation that form constrains function in bi- 
ological systems has historical roots far older than sys- 
tems biology. Century-old examples include those made 
in D'Arcy Thompson's "On Growth and Form" [1] and 
the observation that the quick responses necessary for 
reflex actions such as heat- and pain-avoidance could be 
manifest only by a dedicated input-output relay circuit 
from fingers to brain and back [2] . Advances in synthetic 
biology, often requiring design of systems for which only 
topology can be specified without control over precise 
parameter values, has motivated a reintroduction of such 
topological thinking in biological systems [3-5] . A second 
source of such inquiry is high-throughput systems biol- 
ogy, in which technological advances provide topologies 
of large biological networks without precise knowledge 
of their interactions, dynamics, or possible naturally- 
occuring inputs [6-8] . Such limitations thwart our desire 
to learn form-function relations from data or to derive 
them from plausible first-principles modeling. Our goal 
here is to illustrate how re-framing the question as one of 
computation and statistical analysis allows a clear, quan- 
titative, interpretable approach. 



I. SETUP 

Mathematical progress requires clear definitions of 
terms, including, here, "form," "function," and "follow." 



To define the first two we must choose a specific experi- 
mental setup; we here choose one which has been exper- 
imentally realized repeatedly: that of a small, synthetic 
transcriptional regulatory network with "inducible pro- 
moters," meaning that the efficacy of the transcription 
factors may be diminished by introducing small inter- 
fering molecules [5, 9-11] (Figure 1A). A common "out- 
put" responding to the "input" presence of such small 
molecules is the abundance of inducible green fluores- 
cent protein (GFP), which provides an optical readout 
of one of the regulated genes. The "form," then, will be 
defined by the topology of such a small regulatory net- 
work, distinguishing between up- and down-regulatory 
edges in the network. 1 "Function" will be defined by the 
realizable input-output relations of a device with two bi- 
nary inputs (corresponding to presence or absence of the 
small molecules) and one real-valued output (the tran- 
scriptional level of the output gene) . 

Among other published experiments which correspond 
to this setup is that of Guet, et al. [9]. We re- 
mind the reader of two particularly noteworthy obser- 
vations of Guet and coworkers: (i) that many of their 
experimentally-realized small networks were "broken" in 
the sense that the output remained constant over the dif- 
ferent possible inputs; and (ii) that often the same topol- 
ogy can realize different input-output relations (or even 
be broken, i.e., can realize both a particular function as 
well as a lack of function entirely) . Within a mathemat- 
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1 We use —> to indicate up-regulation, H to indicate down- 
regulation, and — > to indicate regulation whose sign is not spec- 
ified; additionally we use ~-* to indicate inhibition by a small 
molecule. 
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FIG. 1: Network set and input-output functions. (A) Tran- 
scription factor A regulates the expression of transcription 
factor B, which regulates the expression of fluorescent pro- 
tein G. The efficacies of A and B are reduced by the pres- 
ence of chemical inhibitors, labeled by scaling factors x and y 
(Eqn. 1), respectively. We distinguish between up- and down- 
regulation and consider all ways in which regulatory edges 1, 
2, and 3 may appear, for a total of f60 networks. (B-C) 
Examples of non-XOR (B) and XOR (C) functions (see Sec. 
Ill A), as defined by the ranking of conditional probability 

distributions p(G\c), where c £ { , — + , -\ — , ++} describes 

whether each inhibitor is present (+) or absent (— ). Insets 
show mean protein number G in each of the four states. The 
functions in B and C are both realized by the particular net- 
work in A in which edge 3 is absent and all remaining edges 
are down-regulating. 



ical model, such behavior follows straightforwardly from 
considering the behavior of a given dynamical system at 
different points in the space of quantitative parameters 
[12]. Revealing such degeneracy of functions by exploring 
the parameter space given a topology (and given an al- 
gebraic expression modeling the regulatory interactions 
among the genes) may be recast as one of optimizing 
— locally in parameter space — the mutual information 
between input and output over this space [13-15]. Mu- 
tual information (MI) as a cost function is advantageous 
both biologically (in that many natural systems including 
transcriptional regulatory networks are known to operate 
near their information-optimal constraints [16-18]) and 
mathematically (in that by optimizing MI we can identify 
parameter settings which are functional without demand- 
ing in advance the particular input-output functions we 
seek). That is, we optimize for functionality rather than 
for a specific predetermined function. 

MI between input and output is defined as a func- 
tional of the joint distribution p(c, G) — the probability 
of a (here, categorically-valued) setting of the chemistry 
and the (here, real-valued) concentration of the output 
gene (see Sec. II). The stochastic relationship between 
input in output in biological networks has many under- 



lying sources; however, one source is intrinsic: the fi- 
nite copy number of the constituent proteins introduces 
a "shot noise," much discussed in the systems biology 
literature [19-23]. Particular additional sources of noise 
may thwart information processing as well in specific sys- 
tems; hoping to remain as general as possible, we will 
here consider only intrinsic noise. In this respect, we aim 
to describe the functional response(s) of single cells, as 
opposed to an averaged response of a pooled population 
of cells. 

Having defined form and function, we must define "fol- 
low." Here we will need a measure of, for any topological 
feature of the networks, the extent to which the value 
of the feature (e.g., length of a cycle, number of down- 
regulations, etc.) does or docs not correlate with the 
input-output relations the network can perform. Since 
we wish to correlate two categorical (rather than real- 
valued) features, MI is again useful, in that we will rank 
topological features by the information between its value 
and the particular input-output relations networks with 
that feature are found to perform. 

To summarize: "function" is given by the possible 
input-output relations of which a given topology is ca- 
pable; these are found by numerical optimization over 
parameters of the MI between input and output, with the 
probabilistic description of the transcriptional output set 
by intrinsic noise; "form" is mathematized by enumerat- 
ing a list of topological attributes descriptive of our small 
networks. The question "how does function follow form" 
is here recast as a ranking of topological features based on 
the correlation (here, given by MI) between topological 
feature value and the particular input-output relations 
realizable for a given topology. 



II. METHOD 

The method is illustrated in Fig. 2, which makes clear 
how the (general) search for form-function relations may 
be composed into separate tasks, the implementation of 
which is particular to one particular experimental setup. 
For example, the particular elements of the network set 
f2„ follow from system-specific choices made below, but 
the design of the task (Fig. 2) can be applied to relat- 
ing form and function more generally. For a different 
experimental context, one or more of the "modules" in 
the design of the algorithm may need to be replaced, but 
the design itself we anticipate will be useful to revealing 
form-function relations in a wide variety of contexts. 

Below we describe the method in detail as applied to 
our particular experimental setup; further information on 
networks, gene regulation, linear noise analysis, informa- 
tion theory, and optimization is provided in Sec. V. 
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FIG. 2: Outline of the procedure used to ask "how does func- 
tion follow form." Variables are defined in text. 



A. Networks and features 

We consider the simplest set of networks with two 
chemical inputs and one genetic output (Fig. 1A). Each 
of two chemical inhibitors is either present or absent, giv- 
ing four possible input states. When present, the chem- 
icals respectively inhibit two transcription factors, the 
second of which regulates the fluorescent output. We 
consider all ways in which each of the transcription fac- 
tors may regulate itself and the other (with the constraint 
that neither is unregulated) and distinguish between up- 
and down- regulation, giving a total of 160 networks (the 
combinatorial accounting is presented in Sec. VA). The 
topology of each network n e {1,2,... 160} constrains 
the model parameters d to lie within the particular fea- 
sibility set ft n . Having defined the set of networks, we 
enumerate topological features and their values (here, 
yti G {1,2,... 17}; see Table I). 



where o = {A/x,A} when the first inhibitor is {present, 
absent}, b = {B/y,B} when the second inhibitor is 
{present, absent}, and the Rj are degradation rates 
(j € {A, B,G}). The parameters x > 1 and y > 1 
model the effect of the interfering small molecules by 
reducing the effective concentrations of the proteins. 
This gives a total of four chemical input states denoted 

c e { , — h,H — ,++}, each state describing whether 

each of the two inhibitors is present (+) or absent (— ). 
The terms ipj describe the transcriptional regulation of 
each species by its parent (s) and are formulated under 
a statistical mechanical model [24-26]. The statistical 
mechanical approach to modeling transcription is prin- 
cipled, compact, and in the case of combinatorial reg- 
ulation [24] captures the diversity of multidimensional 
reponses observed in experimental systems [27-29]. Full 
algebraic forms of the ipj are dependent on topology, in- 
cluding, in the case of combinatorial regulation, whether 
the transcription factor interaction is additive or multi- 
plicative (see Sec. VB1). 

The stochastic description of each network is set by 
intrinsic noise. We obtain probability distributions over 
protein numbers using the linear noise approximation 
(LNA) [20, 30, 31], since, in contrast to simulation 
techniques [19], the LNA does not rely on sampling and 
is thus much more computationally efficient (making 
many-parameter optimization feasible). Under the 
LNA the steady state distribution over each species' 
protein number is a Gaussian expansion around the 
deterministic mean given by the steady state of Eqn. 
1. The covariancc matrix S under the LNA is de- 
termined from model parameters by (numerically) 
solving the Lyapunov equation JS + EJ T + D = 0, 
where J is the Jacobian of the system in Eqn. 1 and 
D = dmg{R A (cp A + A),R B (ip B + B),R G {ip G + G)} is 
an effective diffusion matrix. Of particular importance 
are the distributions p(G\c), the stochastic response of 
the output species G given that the system is in each 
of the four input states c. The input-output MI may 
be computed directly from this quantity, I[p(c, G)] = 
£ c / dG P (G\c)p(c) \og[p{G\c)/ £ c , P (G\c')p(c% with 
the provision that the input states are equally likely, 
p(c) = 1/4. 



B. Regulatory model 



C. Input-output functions 



The mean protein numbers of the two transcription 
factors A and B and the fluorescent output G are de- 
scribed by the deterministic dynamics 



J_dA 
R~^~dt 

1 dB 
R~B~dt 

1 dG 
R~^~dt 



<PA(a,b) - A, 
ip B (a,b) - B, 
Va{b)-G, 



(1) 



The possible input-output responses of each network 
are found by locally optimizing MI in parameter space. 
The optimization is done numerically using MATLAB's 
fminscarch and initialized by sampling uniform-randomly 
in the logs of the parameters (specific bounds from which 
initial parameters are sampled are given in Table II). The 
optimization is performed at constrained average protein 
number N = (A+B+G)/3 and average timescale separa- 
tion T = {(Ra + Rb)/2]/Rg by maximizing the quantity 
L = I — r)N — kT for various values of the Lagrange 
multipliers rj and k (Rq is fixed). 
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Optimization of MI has the effect of increasing the 
separation among the distributions p(G\c) (see Fig. 1B- 
C). To reflect the fact that many observed regulatory 
networks are known to operate near their information- 
optimal limits [16-18], we use in the statistical analysis 
only those optimal solutions whose MI lies above a cut- 
off value. Choosing a cutoff larger than 1.5 bits (which 
corresponds to two fully separated distributions and two 
fully overlapping distributions) ensures that the means 
of the distributions are fully resolved, and thus allows 
one to define the function performed by the network as 
the ranking r of the means of the distributions p(G\c) 
along the G axis. For the results in this study we use a 
cutoff of 1.55 bits. The method can be easily extended 
to include less informative locally optimal functions, e.g. 
binary logic gates such as an AND function, by using a 
lower MI cutoff and generalizing the definition of func- 
tion as ranking [14]. Fig. 1B-C shows examples of two 
different functions performed by the same network that 
are local optima in MI at different points in parameter 
space; they correspond to r = 1 and r = 9 on the hori- 
zontal axis of Fig. 5, respectively. 

To correct for repeated sampling of the same local opti- 
mum at close but numerically distinct points in the real- 
valued parameter space, nearest-neighbor optima per- 
forming the same function are merged. This enforces 
that the distribution of optimal parameters is sampled 
uniformly. The robustness of subsequent results to this 
choice is tested numerically (see Sec. IIIB). 



D. Correlating feature value and function 

For each topological feature /z, the correlation between 
feature value and function r is computed from the 
joint probability distribution r). This distribution 

is defined by the optimization data and the factorization 



p(Vfj), to yield a statistic 



H[p(v,)} 



(3) 



that ranges from (when the function provides no infor- 
mation about the feature value) to 1 (when the feature 
value is exactly determined by the function). 



III. RESULTS 
A. Non-parametric analysis 

We first present an analysis which requires no assump- 
tions about what is a "flat" distribution in parameter 
space, i.e., we simply enumerate how many networks are 
capable of a given degeneracy of input-output functions, 
and how many input-output functions may be realized 
by a given multiplicity of networks (Fig. 3). This anal- 
ysis recovers an intuitive result (Fig. 3A): that "XOR" 
functions, in which the sign of the influence of one input 
depends on the value of the other, are more difficult to re- 
alize (i.e. they are observed in fewer networks). Simpler 
functions, in which the influence of one input remains 
positive or negative irrespective of the value of the other, 
are easier to realize. The analysis also reveals that each 
network can perform at least two functions (Fig. 3B). 
These are the the two functions consistent with the signs 
of the forward regulatory edges A — > B and B — > G, 
as described in detail in Sec. HID. Since the topology 
A — > B — > G is that obtained in the parametric limit 
when the feedback edges are of negligible strength, it is 
clear that these functions must be realizable; Fig. 3B 
shows further that these functions are sufficiently infor- 
mative to be observed as information-optima. 



B. Topological features and robustness 



P( v ^ r ) = ^P(v»,r,ti,ri) = '^2p(v li ,r\'&,ri)p(d,ri) 
= ^P(v»\n)p(r\&,ri)p('&\ri)p(ri), (2) 

where ■& runs over all points in parameter space at which 
an optimum is found. Here p{v^\n) is {0, 1}- valued, set 
by whether network n has value v for feature fi; and 
p(r\n, i?) is {0, l}-valued, set by whether network n per- 
forms function r at point #, according to the optimization 
data. The distributions p(d\n) and p(n) are assumed to 
be "flat," i.e. p{d\n) = l/|i?|„ and p(n) = l/\n\, where 
|i?|„ is the number of distinct local optima in parameter 
space for network n, and |n| is the number of networks; 
the robustness of subsequent results to weakening either 
of these assumptions is tested numerically (see Sec. Ill B) . 

The correlation between feature value and function 
is computed as their MI, normalized by the entropy of 



Table I ranks the topological features by p, which mea- 
sures how uniquely form determines function. Recall that 
in computing p from p(u M ,r) we assume that the distri- 
butions p(n) and p(i?|n) are both uniform (Eqn. 2); we 
find that the ranking in Table I is robust to deviations of 
both distributions from uniformity, as demonstrated by 
the following numerical experiments. 

The uniformity of p(n) is perturbed by artificially set- 
ting p(n) cx (u n ) e , where u n is a vector of random num- 
bers and e tunes the entropy of the distribution, i.e. 
e = recovers the maximum-entropy (uniform) distribu- 
tion, while e — > oo produces the zero-entropy distribution 
p(n) = 1 <^=> n = argmax (u n ). We find that the ranking 
of the top 4 features is preserved under <~15% pertur- 
bations in the entropy, and that the ranking of the top 
3 features is preserved under ^30% perturbations (sec 
Fig. 7A). This demonstrates that the feature ranking is 
considerably robust to perturbations in the uniformity of 
p(n). 
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FIG. 3: Non-parametric analysis of network functionality. (A) 
Histogram showing how many networks can perform each 
input-output function. Functions are numbered along the 
horizontal axis as in Fig. 5. (B) Histogram showing how many 
input-output functions may be realized by each network. The 
order of networks along the horizontal axis is determined by 
ranking according to number of functions realized. 



Topological feature 

1. Sign of forward edges 

2. Number of up-regulating edges 

3. Number of down-regulating edges 

4. Sign of autoregulation of species B 

5. Number of positive feedback cycles 

6. Number of negative feedback cycles 

7. Nesting structure of feedback cycles 

8. Type of interaction of edges into B 

9. Number of edges 

10. Number of feedback cycles 

11. Sign of A-B feedback cycle 

12. Number of additive interactions 

13. Type of interaction of edges into A 

14. Number of nested feedback cycles 

15. Sign of autoregulation of species A 

16. Number of multiplicative interactions 

17. Sign of edge B -»■ A 



P 

0.92366 
0.18249 
0.18098 
0.03228 
0.01133 
0.01109 
0.00418 
0.00288 
0.00184 
0.00184 
0.00173 
0.00141 
0.00085 
0.00081 
0.00058 
0.00048 
0.00021 



TABLE I: Topological features ranked by correlation measure 
P- 



The uniformity of p{&\n) is perturbed similarly, and we 
find that the ranking of the top 7 features is preserved 
under ~40% perturbations in the entropy of p{d) (see 
Fig. 7B). In this case we also have an independent en- 
tropy scale, given by the fact that we may decompose 
p("d\n) as p($|n) = p('0\'&o 1 n)p( j do\ n )j where ??o is 
the parameter setting that initializes an optimization and 
p($\$o, n ) is determined by the optimization itself. If 
we assume uniformity of p(§ \n), instead of p(t?|n), then 
p(-d\n) is computable from the numbers of times the op- 



timization converges repeatedly on each local optimum 
d. The entropy in this case is 13% different from that of 
the uniform distribution, and the ranking of p is almost 
entirely unchanged (Fig. 7B). This observation demon- 
strates that the results are not sensitive to whether one 
takes the distribution of initial parameters or the distri- 
bution of optimal parameters to be uniform. 



C. Non-redundant features 

Many topological features are not independent; for 
example, the feature "number of up-rcgulating edges" 
is highly correlated with "number of down-regulating 
edges." To interpret which features are associated with 
which sets of realizable functions, it is useful to group 
nearly identical features together and use only the fea- 
ture which is most informative about function (highest 
in p) as the exemplar among each group. To quan- 
tify redundancy among features, we compute the MI be- 
tween each pair of features and normalize by the min- 
imum entropy to produce a weighted adjacency matrix 
= /[p^tv)]/ nm^iffp^)], H\p(v v )]}, which we 
then use as the basis for unidimensional scaling [32] (see 
Sec. VF). 

Fig. 4 plots features' p values against the unidimen- 
sional scaling coordinate, revealing two distinct groups 
of highly informative features. The first, which includes 
the features ranked 1, 2, 3, 5, and 6, is dominated by fea- 
ture 1: the signs (up- or down-regulating) of the forward 
regulatory edges A — > B and B — > G. The second, which 
includes the features ranked 4, 8, 11, and 12, is dominated 
by feature 4: the sign (up-regulating, down-regulating, 
or absent) of the autoregulation of species B. The high 
information content of each of these two features is re- 
vealed visually by inspection of the conditional distribu- 
tion p(r\v^) — p{v^,r) /p(v^) (Fig. 5), as described in 
detail in the next sections. The functional importance of 
both of these features is supported by analytic results; 
for the first feature these analytic predictions were made 
in earlier work [14] and are recalled here, while for the 
second feature new analytic results are derived here. 



D. Forward regulation 

The topological feature that is most informative of net- 
work function is feature 1: the signs of the forward reg- 
ulatory edges A — > B and B — > G. Inspection of the 
feature value-function conditional distribution p(r\v) in 
Fig. 5A reveals a rich, highly organized (and thus highly 
informative) structure which we here interpret. 

The most prominent aspect of the probability matrix 
in Fig. 5A is the high-probability double-diagonal span- 
ning the eight non-XOR functions (i.e. functions 1 and 
2 are most often performed by networks with the first 
feature value, functions 3 and 4 the second feature value, 
functions 5 and 6 the third feature value, and functions 7 



6 




0.1 0.2 0.3 0.4 0.5 



Unidimensional scaling coordinate 

FIG. 4: Identifying feature redundancy. Correlation mea- 
sure p is plotted against a unidimensional scaling coordinate 
which groups similar features together (i.e. the components of 
the eigenvector corresponding to the largest-magnitude eigen- 
value of the feature adjacency matrix M M „). Features are 
numbered by rank (Table I). 
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FIG. 5: Conditional distributions showing the probability of 
a particular input-output function given the value of a topo- 
logical feature, for two features: (A) the signs (up- or down- 
regulating) of the forward regulations, and (B) the sign (up- 
regulating, down-regulating, or absent) of the autoregulation 
of species B. 



and 8 the fourth feature value). These are the functions 
one would expect by looking at the forward edges alone, 
i.e. in the absence of feedback. For example, in networks 
with the last feature value, A — > B — > G, inhibition of 
A and of B will both reduce the expression of G, such 
that the state in which both small molecules are present 
(H — h) produces the lowest-ranked output, and conversely, 

the state in which both small molecules are absent ( ) 

produces the highest-ranked output; functions 7 and 8 
are the two that satisfy these criteria. In previous work 
[14] we termed these functions "direct," and we showed 
analytically that networks are limited to direct functions 
even when feedback is added, so long as each species is 
regulated by at most one other species. This fact is val- 
idated here numerically: a plot of p(v\r) for only those 
networks in our set in which each species is regulated 
by one other species shows nonzero entries only for the 
direct functions (Fig. 9). 

Among all networks, including those with combinato- 
rial feedback (i.e. two edges impinging on one node), we 
see that direct functions still dominate, indicated by the 
bright double-diagonal in Fig. 5A. Networks with com- 
binatorial feedback perform other functions as well, but 
more rarely; examples include those functions in Fig. 5A 
above and below the double-diagonal and XOR functions 
9-12. The performance of these additional functions re- 
mains well organized by feature value, which makes the 
signs of the forward edges a highly informative feature. 



E. Autoregulation of B 

Other than feature 1 (and the features highly corre- 
lated with feature 1), the most informative feature is 
feature 4: the autoregulation of species B. Inspection 
of its feature value-function distribution (Fig. 5B) re- 
veals that the information content lies in the ability to 
perform XOR functions. Specifically, networks in which 
B is autoregulatcd are observed to perform XOR func- 
tions, while networks in which B is not autoregulated 
are not observed to perform XOR functions. Indeed, au- 
toregulation has been observed to enhance the functional 
response to multiple inputs in a related study in the con- 
text of Boolean logic gates [33] . 

XOR functions are those in which the sign of the in- 
fluence of one input depends on the value of the other; 
mathematically they satisfy one or both of two proper- 
ties: 

XOR property I : sign (dG/dx} depends on y, 
XOR property II : sign (dG/dy) depends on ir, 

The four observed XOR functions satisfy property I (e.g. 
Fig. 1C inset); no functions satisfying property II are 
observed (Fig. 5B). Analytic support for these facts is 
obtained by calculating dG/dx and dG/dy, respectively. 

To understand why autoregulation of B is necessary for 
XOR functions satisfying property I, we calculate dG/dx 
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analytically. We obtain (see Sec. VH1) 

dG _ 1 da dip B dtp G 

dx ~ -A dx da dB ' U 

where A_ = (d<p A /dB)(d(p B /dA) - [(dtp A /dA) - 
l][(dfB/dB) — 1] is the determinant of the Jacobian 
of the dynamical system in Eqn. 1 and is always neg- 
ative for stable fixed points. Eqn. 4 has an intuitive form 
when considering the direct path from x to G (Fig. 1A): 
the term da/dx = —A/x 2 < corresponds to the in- 
hibitory signal x A, the term d<p b / 'da corresponds to 
the regulatory edge A — > B, and the term dipc/dB cor- 
responds to the regulatory edge B — > G. Since G has 
only one regulatory input, pc(b) is monotonic, making 
dipo/dB = {d(pa/db){db/dB) = (dpa/db)/y of unique 
sign. The same is true for dips /da when B has only 
one regulatory input (i.e. when B is not autoregulated) . 
However when B has more than one regulatory input (i.e. 
when B is autoregulated), the sign of dips /da can de- 
pend on y, allowing XOR property I. Specifically, under 
our regulatory model, when B is autoregulated, d(ps/da 
is the product of a positive term and a term quadratic in 
b = B/y that has positive roots for some parameter set- 
tings (see Sec. VH2). This analysis suggests inspection 
of the parameters themselves obtained via optimization; 
doing so, we observe that the vast majority of observed 
XOR functions result from optimal parameter values for 
which there exists a positive root in the range < B/y < 
~100, which is the range of protein numbers in which our 
optimal solutions lie (Fig. 1B-C). To summarize, non- 
monotonicity in the regulation of species B, which can 
occur only when B is autoregulated, produces the ob- 
served XOR functions. 

To understand why XOR functions satisfying property 
II are not observed, we calculate dG/dy analytically. We 
obtain (see Sec. VH1) 

dG _ _t_ / _ d<p A \ db dip G 

dy ~ -A \ dA J dy db ' U 

where as before the determinant A is always negative. 
The last two terms correspond to edges along the direct 
path from y to G, i.e. y B and B — > G respectively, 
and are of unique sign; the term in parentheses describes 
the effect of the upstream species A and feedback. In all 
optimal solutions the term in parentheses is observed to 
be positive, despite wide variations in the orders of mag- 
nitude of each of the optimal parameters across solutions. 
This observation is largely explained by a stability anal- 
ysis: for four of the six possible topological classes of 
networks (those in which A is singly, not doubly, regu- 
lated; Fig. 1A), stability of a fixed point of Eqn. 1 implies 
that the term in parentheses is positive; for the other 
two topological classes, stability implies that this term is 
greater than a quantity that is zero for some parameter 
settings and of unknown sign for others (see Sec. VH3). 
In this last case it is unclear whether negative values of 
this term are analytically forbidden or simply exceedingly 



unlikely given the regulatory model and the space of op- 
timal solutions. Empirically this term is always positive, 
and typc-II XOR functions are not observed. 

The necessity of both forward regulation and autoregu- 
lation of species B for the performance of XOR functions 
highlights the importance of combinatorial regulation in 
functional versatility. As previously mentioned, networks 
without combinatorial regulation are limited to a partic- 
ular class of functions which does not include XOR func- 
tions [14]. Moreover, in the original experiment of Guet 
et al. [9], each species was singly regulated, and accord- 
ingly no XOR functions were observed. 

IV. DISCUSSION 

Both in order to assign functional significance to ob- 
served small network topologies in nature, and to design 
synthetic networks which will execute a desired function 
or set of functions, it is useful to develop a systematic 
approach for revealing the extent to which the form of a 
small network guides or constrains its functions. Resort- 
ing to hypothesized functions may be appealing in terms 
of interpret ability, but this strategy risks overemphasiz- 
ing those functions which one is looking for, or overlook- 
ing an unexpected function entirely. 

The statistical analysis, along with the analytic re- 
sults presented above, illustrate how the search for form- 
function relations can be posed as an algorithmic ap- 
proach leading to interpretable mechanisms. While 
we have illustrated it for a particular, experimentally- 
realized setup, the approach itself, subdivided into a set 
of distinct modules in Fig. 2, we anticipate will be appli- 
cable to a wide variety of biophysical contexts. Similarly, 
we have chosen a framework from which much can be 
discovered by analytic study of the deterministic dynam- 
ical system; other experimental setups may require vastly 
different analytic explanations, but the idea of using sta- 
tistical methods to highlight the features of paramount 
importance should be implementable as illustrated. We 
look forward to exploring the extent to which form does 
or does not follow function — and how — in related 
biophysical and biochemical models of small information 
processing networks in biology. 

V. APPENDIX 

The remainder of this paper contains details on the set 
of networks studied, the regulatory model, the optimiza- 
tion procedure, and the statistical analysis used to iden- 
tify exemplars among non-redundant groups of topologi- 
cal features. Additionally, it includes further background 
on information theory and the linear noise approxima- 
tion. Finally, it presents numerical and analytic results 
as referenced in the main text, including numerical tests 
of the robustness of results to uniformity assumptions 
made in the statistical analysis, statistical validation of 



an analytically derived functional constraint, and ana- 
lytic results on the ability of networks to perform XOR 
functions. 



A. Networks 

Many researchers over the past decades have begun 
to focus on the description of a given biological system 
not in terms of the isolated functions of its independent 
components, but in terms of the collective function of the 
network of interacting components as a whole. A central 
example of such a network is that describing interactions 
among genes: many genes produce proteins called tran- 
scription factors, whose role is to influence the protein 
production of other genes. The goal of the present study 
is, for a set of such networks, to develop a statistical 
method for determining the extent to which the topol- 
ogy of the network correlates with the function(s) it per- 
forms. In this section we describe the set of networks 
considered, including presentation of its combinatorics. 

We consider the set of networks consisting of two tran- 
scription factors A and B, each of which can be chemi- 
cally inhibited, each of which is regulated by itself, the 
other, or both, and one of which regulates the expression 
of a fluorescent output gene G (Fig. 1 A) . We distinguish 
between up- and down-regulating edges, and, in the case 
of more than one regulatory input edge, between addi- 
tive and multiplicative interaction of the transcription 
factors (see model in next section). This gives a set of 
160 networks, as described by the combinatorics here. 

There are six ways in which the three possible feed- 
back edges illustrated in Fig. 1A can appear such that 
species A remains regulated. In 2 configurations (Figs. 
6A-B), both A and B are singly regulated, and there are 
a total of 3 edges per configuration, each of which can 
be up- or down-regulating. The number of networks is 
thus [number of configurations] x [2 "(number of edges)] 
= [2] x [2( 3 )] = 16. 

In 3 configurations, (Fig. 6C-E), one of A and B is 
singly regulated while the other is doubly regulated. In 
the case of double regulation, the interaction between 
transcription factors is cither (i) additive, in which the 
signs (up or down) of the two regulatory edges are in- 
dependent, giving 4 possibilities, or (ii) multiplicative, 
in which the two regulatory edges have the same sign, 
giving 2 possibilities (see model in next section), for a 
total of 6 possibilities. The number of networks is thus 
[number of configurations] x [2 "(number of singly regu- 
lated nodes)] x [6" (number of doubly regulated nodes)] 
= [3] x [2( 2 )] x [6«] = 72. 

In the last configuration (Fig. 6F), both A and B 
are doubly regulated, and the number of networks is 
[number of configurations] x [2 "(number of singly regu- 
lated nodes)] x [6~ (number of doubly regulated nodes)] 
= [1] x [2«] x [6^] = 72. This gives a total of 
16 + 72 + 72 = 160 networks. 
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FIG. 6: The six possible topological configurations of the net- 
works studied. 



B. Gene regulation 



A regulatory network is most simply described using a 
dynamical system whose degrees of freedom are the con- 
centrations of its constituent proteins. The nature of the 
interactions is then determined by the functional form of 
regulatory production terms on the righthand side of the 
dynamical system. In this study we formulate produc- 
tion terms under a statistical mechanical model [24-26] 
that has been shown to capture diverse functionality in 
the case of combinatorial regulation [24] . 

Because many proteins are present in cells in as few as 
tens or hundreds of copies [34], a deterministic dynam- 
ical system, which ignores intrinsic fluctuations around 
mean protein concentrations, can provide an insufficient 
description of the biological system when copy numbers 
are low. Ideally one instead seeks to solve the master 
equation, which describes the dynamics of the probabil- 
ity of observing given protein numbers. Unfortunately, 
almost all master equations describing regulatory net- 
works, including the set of small networks we study here, 
arc intractable analytically. As a result, several tech- 
niques to simulate [19] or approximate [31] the master 
equation have been developed; we here employ the lin- 
ear noise approximation (LNA) . Since the LNA does not 
rely on sampling (in contrast to simulation techniques), 
it is computationally efficient, which makes feasible the 
many-parameter optimization performed in this study. 

The LNA is a second-order expansion of the master 
equation, with the first-order terms recovering the deter- 
ministic description. Therefore, in the present section, 
we first describe in detail the regulatory model used in 
the deterministic system; then we describe the LNA and 
its application to our networks. 
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1. Deterministic description: Dynamical system 

As in Eqn. 1, the mean protein numbers of the 
three species are described by the following determin- 
istic dynamics (for notational brevity the bars have been 
dropped — i.e. the quantities A, B, and G have been 
changed to A, B, and G — and the regulation terms Lp Ai 
ifB, and ipc have been relabeled a, /3, and 7, respec- 
tively) : 



For our networks, the regulation terms a, /?, and 7 are 



1 dA 

R^^dt 
1 dB 
R^lt 
J_dG 
R~^~dt 



a(a, b) — A, 
0{a,b)-B, 
7(6) - G, 



(6) 
(7) 
(8) 



where a = {A/x,A} when the first inhibitor is {present, 
absent}, b = {B/y,B} when the second inhibitor is 
{present, absent}, and the Rj are degradation rates 
(j € {^4, B, G}). The parameters x > 1 and y > 1 incor- 
porate the inhibition by reducing the effective concentra- 
tions of the proteins. This gives a total of four chemical 
input states c e { , — h, H — , ++}, each state describ- 
ing whether each of the two inhibitors is present (+) or 
absent (— ). 

a. Stability Steady state solutions of the dynamical 
system in Eqns. 6-8 are stable fixed points, i.e. points at 
which the all eigenvalues of the Jacobian have negative 
real part. The Jacobian of the system is 



J = 



dA 



1 



V 



dA 





da 
dB 

d§_ _ 

dB 

dB 



Its eigenvalues are 



Al,2 — 



da 
OA 



d/3_ 
dB 



2± 






-V 



dB J 



(9) 



da df3_ 
dBdA 



(10) 

and A3 = —1. At stable fixed points Rea^Ai^} < 0; 
equivalently the determinant of Eqn. 9, which is real and 
equal to the product of the eigenvalues, must be negative: 



det(J) = 
da dfi 
dBdA 



A1A2 A3 
' da 

dA 



- 1 



d/3 
dB 



-1 <0. (11) 



b. Regulation terms The regulatory model is that 
of Buchler et al. [24], in which the protein production 
is proportional to the probability that the RNA poly- 
merase (RNAp) is bound to the promoter of interest. 
This probability is formulated thermodynamically, i.e. by 
enumerating and statistically weighting all ways in which 
transcription can and cannot occur. 



sa 



a{a,b) = —p A (a,b), 



0(a,b) 



SB 

Rb 



PB(a,b), 



7(6) = ^PGf(&), 



(12) 
(13) 
(14) 



where the Sj are promoter strengths, and the probabili- 
ties of transcription pj are given by 



Pi 



on 



(15) 



oft 



The partition functions Z^ n and Z ] oS describe all the ways 
that transcription can occur and not occur, respectively, 
at the promoter region of gene j. As presented in detail 
below, the partition functions are determined by network 
topology and depend on additional parameters, including 
interaction strengths w, binding constants K, and "leak- 
iness" q (i.e. the statistical weight given to bare binding 
of the RNAp). All parameters are positive. We first of- 
fer qualitative interpretation of the parameters, then we 
present the detailed expressions for the partition func- 
tions. 

The w describe the interaction strengths between tran- 
scription factors, or between a transcription factor and 
the RNAp. Alphabetical superscripts refer to the pro- 
moter regions of genes A, B, or G, while numerical sub- 
scripts refer to the molecules involved in the interaction: 
RNAp (0), transcription factor A (1), or transcription 
factor B (2). For example, Wq X describes the interaction 
between RNAp and transcription factor A at the pro- 
moter region of gene B. 

The signs of the logs of the w J oi determine the signs of 
the edges (where j € {^4, B, G} for the three promoter 
regions and i € {1, 2} for transcription factors A and B). 
For example, Wq X describes the regulation of species B 
by species A; if logu^ > then the edge A — > B is 
up-regulating, and if log < then the edge A — > B 
is down-regulating. 

Following the model of Buchler et al. [24], when two 
transcription factors regulate one species, they may do so 
"independently" or "synergistically." Independent regu- 
lation corresponds to the case when both transcription 
factors interact with the same domain of the RNAp; 
mathematically the interaction strengths are additive 



( w oi 



2 ). If the RNAp has several interaction do- 



mains, two transcription factors can interact synergis- 
tically, and the interaction strengths are multiplicative 
{woiWq 2 )- Synergistic regulation implies the additional 
constraint that the regulatory effects of the two transcrip- 
tion factors (i.e. the logs of the interaction strengths) are 
of the same sign. 

The K describe the binding constants of each tran- 
scription factor to each promoter region. They have 
super- and subscripts similar to the to, e.g. K~f describes 
the binding of transcription factor A to the promoter re- 
gion of gene B. 
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The partition functions Z^ n ^ oS for the network topolo- 
gies shown in Fig. 6 are presented below. To provide 
intuition, the first two expressions are interpreted in de- 
tail. 

Topology A is shown in Fig. 6A; its partition functions 
are 



are 



Z A 


A a 

= (1 -4- 7/?Ai CI 


(16) 


7 A 


a 

= 1+ w 


(17) 


7 B 
^on 


B a 


(18) 


yB 


a 

= 1 + w 


(19) 


7 G 
^on 


G b 


(20) 


7 G 


= 1+ w 


(21) 



Eqn. 16 describes the two ways in which transcription can 
occur at the promoter region of gene A: (i) the RNAp 
can bind unassisted, with statistical weight q, or (ii) since 
A is self-regulating, the RNAp can bind upon interaction 
with transcription factor A, with weight proportional to 
q for the RNAp, to the effective concentration a scaled by 
the binding constant K A for transcription factor A, and 
to the interaction strength Wqi between the RNAp and 
transcription factor A. Eqn. 17 describes the two ways in 
which transcription can not occur at the promoter region 
of gene A: (i) there can be nothing bound, an outcome 
whose statistical weight we are free, via the normaliza- 
tion enforced in Eqn. 15, to set, and so we set to 1, and 
(ii) the transcription factor alone can bind, with weight 
a/Ki. Eqns. 18-21 are similarly derived according to the 
topology of the network. 

Topology B is shown in Fig. 6B; its partition functions 
are 



7 A 



7 A 



1 + W 0l9j^A- 



1 + 



(28) 
(29) 



q + w Q1 q 



K? 

B , „„-B\„ B 



+ (w 01 +w 02 )w 12 qj^j^ (additive), (30) 
b b 

, L oitfLn- 

„B „„B „„B 



w oi9- 



:'/ (multiplicative), (31) 

K i K 2 
^ a b 

G b 

q + w% 2 q- 



w 



12 iff K 2 ' 



(32) 
(33) 
(34) 



Topology D is shown in Fig. 6D; its partition functions 
are 



7 a 



A a a b 

a b 



Zt = 



+(w 01 + w 02 )w 12 q^j (additive) 
A a a b 

9 + ^019^4+^029^1 



(35) 











AAA CI b 

+w 01 w 02 w 12 qj^jj^ 


7 A 
Z on 


A b 


(22) 


7 A 


a b A 


7 A 


b 

= 1+ w 


(23) 


7 B 
^on 


R a 


7 B 
^on 


B a 

= 9 + w oi9-^b, 


(24) 


7 B 


a 

= 1+ Kj> 


7 B 


a 

= 1 + W 


(25) 


7 G 
Z on 


G b 


7 G 
^on 


G b 

= q + w^q-^G, 


(26) 


7 G 
^off 


= 1+ w 


7 G 


= 1+ w 


(27) 







a b 



(37) 
(38) 
(39) 

(40) 

(41) 



Topology C is shown in Fig. 6C; its partition functions 



Topology E is shown in Fig. 6E; its partition functions 
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2. Stochastic description: Linear noise approximation 



7 A 



7 A 
Z off 



1 + 



Z» = 



7 B 



7 B 



7 G 



1 + W OlV^B + W °2 q fcB 



(42) 
(43) 

(additive), (44) 



+( w oi + Wo 2 )wf 2 Q 

+ w oi w 02 w 12 ( 1-^b-^b (multiplicative), (45) 
a b 



w 



B 



a 



12 iff K 2 ' 



q + w£ 2 q 



(46) 
(47) 
(48) 



The linear noise approximation (LNA) is a second- 
order expansion of the master equation made under the 
approximations that (i) mean protein numbers are large 
and (ii) fluctuations are small compared to means. Un- 
der the LNA, the steady state solution to the master 
equation is a Gaussian distribution: the first-order terms 
recover the deterministic system and thus provide the 
means, and the second-order terms yield an equation for 
the covariance matrix. Comprehensive discussions of the 
linear noise approximation can be found in [20, 30, 31]. 

Under the LNA the steady state distribution over each 
species' protein number is a Gaussian expansion around 
the deterministic mean given by the steady state of Eqns. 
6-8. The covariance matrix S is determined from model 
parameters by solving the Lyapunov equation 



(57) 



,m + 3J 1 + D = 0, 
where J is the Jacobian of the system (Eqn. 9) and 



R A (a + A) 

D = \ R B {P + B) j ("><S) 
R G {j + G), 



Topology F is shown in Fig. 6F; its partition functions 
are 



Z± 



9 + W 0l9j^A 
„A 1 „„A 



a 



yA . 



+K+«^K«p^f (additive), (49) 
b 



q + w 01 q 



is an effective diffusion matrix. We solve Eqn. 57 numer- 
ically using MATLAB's lyap function. 

The deterministic steady state and the covariance ma- 
trix are computed four times (once in each of the chemi- 
cal input states c); the lower-right terms of each provide 
the mean and variance, respectively, of each (Gaussian) 
output distribution p{G\c). The input-output mutual 
information is computed directly from the distributions 
p(G\c), as described next. 



7 A - 



Zl = 



z B 



off 
G 



7 G 



+W, 
1 + 



01^02^129^4 j^j (multiplicative), (50) 
a b A a b 



(51) 



+O01 + w 02)Wi 2 q^B -j^b (additive), (52) 



1 + W 01<1-^B + W 02l^B 



\-w^w^ 2 wf 2 q 
a b 



01^02 <"i2y^s j^b 



(multiplicative), (53) 



1 + 



q + w% 2 q 



L! ^ 
b 



w 



12 ISB isB ' 



1 + 



(54) 
(55) 
(56) 



C. Information theory 

In this study we make use of a central measure from 
information theory: mutual information (MI). MI is a 
fundamental measure of the strength a relationship be- 
tween two random variables. More precisely, it measures 
the reduction in entropy of one variable given measure- 
ment of the other. MI captures correlation between two 
random variables even when a relationship exists that is 
nonlinear (unlike, e.g., the correlation coefficient) or non- 
monotonic (unlike, e.g., Spearman's rho). It has found 
wide use in the study of biological networks both as a sta- 
tistical measure [35, 36] and as a measure of functionality 
in the presence of noise [16-18]. For a comprehensive re- 
view of information theory we refer the reader to [37] . 

In this study we employ MI in two separate contexts: 
(i) as a measure of network functionality and (ii) as a 
measure of statistical correlation. In the first context, 
we optimize the MI between the chemical input state and 
the concentration of the output protein. In the second 
context, we compute, for a given topological feature (e.g. 
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number of up-regulations) , the MI between the value that 
the feature takes in a network and the function that the 
network performs. We here describe these computations 
in turn. 

MI is the average of the log of a ratio. The average is 
over the joint probability distribution between two ran- 
dom variables, and the ratio is that of the joint to the 
product of the marginal distributions. For a discrete 
variable, i.e. the chemical input state c, and a contin- 
uous variable, i.e. the output protein concentration G, 
MI takes the form 



I[ P (c,G)]=J2 / dGp(c,G)\og 2 



p(c,G) 
'p(c)p(Gy 



(59) 



where the log is taken in base 2 to give / in bits. Noting 
that p(c,G) = p(G\c)p(c) and p(G) — J2 c p( c >G) allows 
one to write the MI as 



I[ P (c,G)]=J2 J dG P (G\c)p(c) log. 



p(G\c) 



2 Y. cl p{G\d)p{c>y 

(60) 

i.e. entirely in terms of the conditional output distribu- 
tions p(G\c), obtained via the linear noise approximation 
(see above), and the input distribution p(c), which we 
take as p(c) = 1/4 for equally likely chemical input states. 
Eqn. 60 is integrated numerically during the optimization 
using MATLAB's quad. 

In the context of correlating, for a topological feature 
/i, the feature value v M with the observed function r, since 
both are categorical (and therefore discrete) variables, MI 
takes the form 



'b(w)] = £p(w)i°g|^. (6i) 



Here, the joint distribution p(v^,,r) is computed from 
the optimization data according to Eqn. 2, and the 
marginal distributions are trivially obtained as p(v^) — 

ErPKo r ) and p( r ) = E„„p(<v> r )- 

Additionally we note that MI is again used in the con- 
text of statistical correlation to compute a feature adja- 
cency matrix (see Sec. VF). That is, the MI between the 
values of a feature \x and a feature v is given by 



I\p{vn,v v )] 



(62) 



where p(v^ 7 v„) is computed directly from the network 
set. 



D. Optimization 

Optimization of input-output information I[p(G, c)] 
(Eqn. 60) over model parameters is done numerically 
using MATLAB's fminsearch. Each optimization is 
performed at constrained average protein number N = 



Parameter 


Bounds 


Promoter strengths, s 

Interaction strengths, w > 1 (up-regulation) 
Interaction strengths, w < 1 (down-regulation) 
Binding constants, K 
Leakiness, q 

Scaling factors, x > 1, y > 1 
Degradation rates, Ra, Rb 
Degradation rate, Rg (fixed) 


1Q -4 _ 1Q 6 

1.05- 10 2 
10" 2 -0.95 
10 _1 - 10 2 
10~ 1() - 10~ 2 
1.1 - 10 4 
10~ 7 - 10° 
4- 10- 4 



TABLE II: Bounds from which parameters are drawn to ini- 
tialize optimization. 



(A + B + G)/3 and average timescale separation T = 
[(Ra + Rb) fA/ Rg by maximizing the quantity 



L = I\p(G, c)] — r]N — kT 



(63) 



for various values of the Lagrange multipliers r\ and k 
(Rg is fixed). The optimization is initialized by sampling 
uniform-randomly in the logs of the parameters; bounds 
from which initial parameters are drawn are given in Ta- 
ble II. 



E. Robustness of p ranking 

The correlation between topological feature value w M 
and network function r is measured for each feature p, 
using a normalized mutual information . This measure 
is a function of the joint distribution p(v^,r), whose com- 
putation (Eqn. 2) depends on two distributions which we 
take to be uniform: (i) p(n), the probability of observing 
each network n, and (ii) p($\ri), the probability of observ- 
ing each optimally functional point i) in the parameter 
space of network n. Here we show that the ranking of the 

is robust to perturbations in the uniformity of each of 
these distributions. 



1. Perturbing p(n) 

The uniformity of p(n) is perturbed by artificially set- 
ting p(n) oc (u n ) e , where u n is a vector of random num- 
bers and e tunes the entropy of the distribution H[p(n)]. 
That is, e = recovers the maximum-entropy (uniform) 
distribution, while e — > oo produces the zero-entropy so- 
lution p(n) — > S(n, argmaxu„) (where 5 is the Kronecker 
delta). Fig. 7A plots the as a function of the entropy 
H[p(n)\. As seen in Fig. 7A, the ranking of the top 4 
features is preserved under ~15% perturbations in the 
entropy, and that of the top 3 features is preserved under 
^30% perturbations. This demonstrates that the fea- 
ture ranking is considerably robust to perturbations in 
the uniformity of p(n). 
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A 10° 




Entropy, H[p(n)] (bits) 



B io° 




Entropy, H[p(8)] (bits) 

FIG. 7: Values of the correlation measure p for each of the 
17 features as a function of the entropies (A) H[p(n)] and 
(B) if[p(#|n)]. Each point represents the average of 8 trials. 
In B, the dashed vertical line shows the entropy under the 
assumption that p($o\n), n °t p($|n), is uniform. 



A 




MDS coordinate 2 

MDS coordinate 1 



FIG. 8: Identifying non-redundant topological features. (A) 
Feature adjacency matrix (Eqn. 65). (B) Features plotted 
according to correlation measure p and the coordinates of a 
two-dimensional scaling based on the adjacency matrix in A. 
Features are numbered as in Table I. 



2. Perturbing p(J)\n) 



Non-redundant features 



The uniformity of p("d\n) for each network n is per- 
turbed via the same procedure described in the previous 
section. Fig. 7B plots the p^ as a function of the entropy 
of p($) = ^2 n p{^\n)p{n)^ where p{n) here is uniform. 
As seen in Fig. 7B, the ranking of the top 7 features is 
preserved under ~40% perturbations in the entropy of 
indicating that the feature ranking is very robust 
to perturbations in p(i?|n). 

In this case we also have an independent entropy scale, 
given by the fact that we may decompose p(i?|n) as 



p(ti\n)=Y,PWo,n)p(tf \n), 



(64) 



where $rj is the parameter setting that initializes an op- 
timization and p(?9|?9o! n ) is determined by the optimiza- 
tion itself. If we assume uniformity of p(&o\n), instead of 
p(D\n), then p(&\n) is computable from the numbers of 
times the optimization converges repeatedly on each lo- 
cal optimum The entropy in this case is 13% different 
from that of the uniform distribution, and the ranking of 
p is almost entirely unchanged (Fig. 7B). Fig. 7B demon- 
strates that the results are not sensitive to whether one 
takes the distribution of initial parameters or the distri- 
bution of optimal parameters to be uniform. 



To interpret which features are associated with which 
sets of realizable functions, it is useful to group nearly 
identical features together and use only the feature which 
is most informative about function (highest in p) as the 
exemplar among each group. To quantify redundancy 
among features, we compute the MI between each pair 
of features and normalize by the minimum entropy to 
produce a weighted adjacency matrix 



(65) 



(Fig. 8A). We then use the adjacency matrix as the basis 
for multidimensional scaling, as described below. 

Multidimensional scaling (MDS) describes a class of 
techniques used to visualize proximities among data in a 
low-dimensional space. One of the most common tech- 
niques (also called principal components analysis when 
applied to a correlation matrix) is to use as the low- 
dimensional coordinates the eigenvectors of the adja- 
cency matrix corresponding to the largest-magnitude 
eigenvalues. Data points with high mutual proximity 
then tend to be grouped together along these coordi- 
nates. Fig. 8B and Fig. 5 show the application of this 
technique in two and one dimensions, respectively, to the 
adjacency matrix in Fig. 8A, revealing groups of similar 
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features. Plotting the feature-function correlation mea- 
sure p along the vertical axis in each case makes apparent 
the most informative feature in each group (i.e. feature 
1 in one group and feature 4 in a second group). 



G. Direct functionality: validation of known 
analytic result 

In previous work [14] we show analytically that net- 
works in which each species is regulated by at most one 
other species perform only "direct" functions, in which 
the sign of the effect of an input species on an output 
species depends only on the direct path from input to 
output, even when there is feedback. This analytic re- 
sult is here validated by our statistical approach. 

In the context of our setup (see Fig. 1A), the direct 
paths from the inputs (the chemical inhibitors labeled by 
x and y) to the output (the fluorescent protein G) involve 
only the forward regulatory edges A — »• B and B — > G. 
Therefore considering only those networks in which each 
species is singly regulated (Fig. 6A-B), the analytic re- 
sult predicts that the signs of the forward edges uniquely 
determine the type of function performed. Plotting the 
conditional distribution p(r\v) for the feature 'signs of 
the forward edges' using only data from these networks 
confirms that this is indeed the case (Fig. 9). Accord- 
ingly the correlation for this distribution is p — 1, the 
maximum possible value. 

The functions performed at each of the feature values 
in Fig. 9 can be understood intuitively. For example, 
in networks with the last feature value A — > B — > G, 
inhibition of A and of B will both reduce the expression 
of G, such that the state in which both small molecules 
are present (++) produces the lowest-ranked output, and 
conversely, the state in which both small molecules are 

absent ( ) produces the highest-ranked output; one 

may verify by inspection that functions 7 and 8 are the 
two that satisfy these criteria. The correspondence of 
the other function pairs to feature values can be similarly 
understood in terms of the edge signs. 



H. XOR functionality: analysis 

As described in the main text, XOR functions satisfy 
one or both of two properties: 

XOR property I : sign (dG/dx) depends on y, (66) 
XOR property II : sign (dG/dy) depends on a;. (67) 

To analytically understand the observed XOR function- 
ality, we here calculate the derivatives dG/dx and dG/dy 
from the steady state of the deterministic system (Eqns. 
6-8). We further show how the forms of these derivatives 
support the observations that (i) all XOR functions sat- 
isfying property I are performed by networks in which 
species B is autoregulated, and (ii) no functions satisfy- 
ing property II are observed. 
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FIG. 9: Conditional distribution showing the probability of a 
particular input-output function given the value of the topo- 
logical feature 'signs (up- or down-regulating) of the forward 
regulatory edges,' using only data from networks in which 
each species is singly regulated (Fig. 6A-B). Only direct func- 
tions are performed, as defined in the text. 



1. Calculating the derivatives 

The steady state of Eqns. 6-8, with all functional de- 
pendencies made explicit, is 



A = a[a(A,x),b(B,y)}, 
B = p[a(A,x),b(B,y)], 
G = j[b(B,y)}, 



(68) 
(69) 
(70) 



where a(A,x) = A/x and b(B,y) — B/y. The output G 
depends on the input x only through 6, and b depends 
on x only through B, i.e. 



dG _dj _ d-y db dB 
dx dx db dB dx 

The dependencies of A and B on x are coupled: 



(71) 
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(73) 
(74) 
(75) 
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Eqns. 73 and 75 form an algebraic system of equations 
in the variables dA/dx and dB/dx, whose solution is 



where 



dA 

dx 



dB 

dx 



A = 



1 

^A 



1 



da da 
da dx 
da db d/3 da 
db dB da dx 
1 dp da 
■A da dx ' 



dp db 
IttdB 



(76) 
(77) 



da db 
~dbd~B 
( da da 
~ \da~dA 
da d/3 
dBdA 



dp da 
da~dA 



- 1 

da 
dA 



1 



dJ3_db_ 
dbdB 

dp_ 
dB 



- 1 



1 



(78) 
(79) 



is the determinant of the Jacobian of the dynamical 
system and is always negative at stable fixed points 
(Eqn. 11). Substituting Eqn. 77 into Eqn. 71 and us- 
ing d^f/dB — (d-f/db)(db/dB) yields Eqn. 4 of the main 
text, 



dG_ 

dx 



1 da dp <9 7 
^d^dadB' 



(80) 



The output G depends on the input y through 6, which 
depends on y cither indirectly through B or directly i.e. 



dG 
dy 

As on x, 
dA 
dy 



dB 
dy 



db_dJ3_ db\ 
dB dy dy J 



(81) 



d 7 07 db c?7 
dy db dy db 

the dependencies of A and B on y are coupled: 
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Eqns. 83 and 85 can be solved to yield 



dA 
dy 
dB 
dy 



1 da db 
^A <% dy' 



(82) 
(83) 
(84) 
(85) 

(86) 



1 - 



da da 
~da~dA 



(87) 



1 [dp db 
^A [dbdy 
dp da da db 
da dA db dy 

where A is the determinant as in Eqn. 79. Substituting 
Eqn. 87 into Eqn. 81 and simplifying gives Eqn. 5 of the 
main text, 

dG _ 1 / da\ dbdj 
_ -A \ ~ dAJ dydb' 

where da/dA = (da / da) (da /d A). 



(88) 



2. Type-I XOR functions require autoregulation of B 

Type-I XOR functionality (Eqn. 66) requires the sign 
of the derivative dG/dx (Eqn. 80) to depend on y. Here 
we go through each term in Eqn. 80 and conclude that 
only the third can change sign. The first term in Eqn. 80, 
l/(— A), is always positive because the determinant A is 
equal to the product of the three eigenvalues of the Jaco- 
bian of the deterministic system, which at a stable fixed 
point are all negative (Eqn. 11). The second term in Eqn. 
80 is da/dx = —A/x 2 , which is always negative, corre- 
sponding to the inhibitory effect of the small molecule x 
on transcription factor A. The fourth term in Eqn. 80 is 
d-f/dB = (dj/db)(db/dB). The factor db/dB = 1/y is 
always positive, and the factor dj/db is of unique sign 
because 7(6) is monotonic, as shown below. This leaves 
only the third term, dp/da, which can change sign if and 
only if B is autoregulated, as discussed below. 

Under our model a regulation function with only one 
argument is monotonic, which is consistent with the 
interpretation of the edge being either up- or down- 
regulating. For example, the regulation function corre- 
sponding to the edge B — >• G in all networks is (see Eqns. 
14-15) 



7(6) = 



SG 



z9t 



(89) 



where Zg and Z^ s are given by, e.g., Eqns. 20-21. The 
derivative of this function with respect to its argument 
is 



d"f 
d6 



R G Z 2 G 



( dZg 
\ db 



J off 



7 G dZg 
J ° n db 



(90) 



where Zq = Zg + Zg. Upon differentiating and in- 
serting Eqns. 20 and 21, all dependence on b inside the 
parentheses cancels, leaving 



db ~ R G K§Zl 



(«>02 



!)• 



(91) 



Eqn. 91 confirms that 7(6) is monotonic, with Wq 2 > 1 
corresponding to up-regulation, and Wq 2 < 1 correspond- 
ing to down-regulation. 

If species B is not autoregulated, then it is only reg- 
ulated by species A; the regulation function then only 
has one argument, i.e. P(a,b) = /3(a), and, as with 7(6) 
above, it is monotonic. Therefore without autoregula- 
tion of B, type-I XOR functionality is not possible. We 
now compute the derivative dp / da in the case of two ar- 
guments to analytically demonstrate the converse: that 
with autoregulation of B, typc-I XOR functionality is 
possible. 

As with Eqn. 90, the partial derivative of ft (a, b) with 
respect to a takes the form 



dp 
da 



sb fdzg B B dz& 

R B Z% \ da off on da , 



(92) 
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where Zb = Zi. 



Arff' Z on 



(with B autoregulated) is 



given by, e.g., Eqn. 30 for additive interaction and Eqn. 
31 for multiplicative interaction, and Z^ s is given by, e.g., 
Eqn. 32. Upon differentiating and inserting the expres- 
sions for Z^ n and Z^ s , all dependence on a inside the 
parentheses cancels (for both additive and multiplicative 
interaction), leaving 



l-idmi {c '" 2+c ' b+c ^ 

where for additive interaction, 



Co 



01 



(93) 



(94) 



Cl = J^B Kl - <2 - <1 + « + < 2 ) Wit] , (95) 



Co 



"01^2, 



and for multiplicative interaction, 



Co = wg 



1, 



(96) 



(97) 



Ci = -^[w^-wg-wg+w^wgwg], (98) 



Co 



1 



(99) 



Eqn. 93 is the product of a positive term and a quadratic 
function of b = B/y. It is straightforward to demon- 
strate in both the additive and multiplicative cases (e.g. 
by sampling numerically) that for positive w^, Wq 2 , and 
W12 the quadratic function can have positive, negative, 
or complex roots. When at least one root is positive, 
the sign of dpi /da changes at positive B/y, i.e. the sign 
can depend on y. Since dG/dx is proportional to 8(3/ da 
(Eqn. 80), this enables type-I XOR functionality. This 
analysis suggests inspection of the parameters themselves 
obtained via optimization; doing so, we observe that the 
vast majority of observed XOR functions results from op- 
timal parameter values for which there exists a positive 
root in the range < B/y < ~100, which is precisely the 
range of protein numbers in which our optimal solutions 
lie. 

To summarize, nonmonotonicity in the regulation of 
species B, which can occur only when B is autoregulated, 
produces the observed XOR functions. 



of the terms in Eqn. 88 are of unique sign: the terms 
l/(— A) and d"f/db are positive and of unique sign respec- 
tively, as discussed in the previous section; and the term 
db/dy = —B/y 2 is always negative, corresponding to the 
inhibitory effect of the small molecule y on transcription 
factor B. This leaves only the term (1 — 8a/8A), which, 
as discussed below, for four of the network topologies is 
provably positive at stable fixed points, and for the other 
two network topologies is observed to be positive for all 
optimal solutions. 

For topologies B and E (Fig. 6), in which da/dA = 0, 
the term (1 — da/dA) = 1 is clearly positive. For topolo- 
gies A and C, in which da/dB = 0, the first eigenvalue of 
the Jacobian (Eqn. 10) reduces to Ai = da/dA — 1; since 
this must be negative for stability, the term (1 — da/dA) 
is always positive for these topologies as well. This leaves 
only networks with topology D or F. 

In networks with topology D or F, It is unclear whether 
type-II XOR functions are analytically forbidden or sim- 
ply exceedingly improbable for optimally informative pa- 
rameters. Some analytic constraints can be obtained by 
the facts that since the real parts of Ai and A 2 (Eqn. 10) 
are negative at stable fixed points, their sum must be 
negative and their product must be positive, i.e. 



< -(A1 + A2) 



1 - 



da 
dA 



1 - 



< AxA 2 = [1- — ] (1 



da 



df3_ 

o~b 



df3_ 

dB 
dadf3_ 
dBdA 



(100) 
(101) 



If (1 - d/3/dB) < 0, then Eqn. 100 implies that (1 - 
da/dA) > 0, forbidding type-II XOR functions. If on 
the other hand (1 - d(3/dB) > 0, Eqn. 100 docs not 
restrict the sign of (1 — da/dA), but Eqn. 101 implies 



da 
dA 



> 



(da/dB)(d/3/dA) 
1 - dp/dB ' 



(102) 



Although in this last case it is not known whether the 
right-hand side is constrained to be positive, empirically 
the term (1 — da/dA) > is observed to be positive for 
all optimal solutions, even over wide variations in the 
orders of magnitude of each of the optimal parameters 
across solutions. 
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